Extracting Paraphrases of Japanese Action Word of Sentence Ending Part from Web and Mobile News Articles
Abstract
In this research, we extract paraphrases from two kinds of Japanese news articles: Web news articles, which are long and intended for display on personal computer screens, and mobile news articles, which are short, compact, and intended for the small screens of mobile terminals. We collected both kinds for more than two years and aligned them first at the article level and then at the sentence level. As a result, we obtained more than 88,000 pairs of aligned sentences. Next, we extract paraphrases of the final part of sentences from this aligned corpus. The paraphrases we try to extract are the sentence-final nouns of mobile article sentences and their counterpart expressions in Web article sentences. We extract character strings and word sequences as paraphrases based on branching factor, frequency, and string length. For the 100 most frequently used action nouns, the precision is 90% for the highest-ranked candidate and 83% to 59% for each of the top three candidates.
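The abstract names three ranking signals (branching factor, frequency, and string length) but does not give the scoring formula. The following is a minimal sketch of one way such a ranking could work, not the authors' method. It assumes the input is a list of Web sentences that were all aligned with mobile sentences ending in the same action noun, it treats "branching factor" as the number of distinct characters that appear immediately before a candidate string, and the function name `rank_ending_candidates` and the parameters `max_len` and `min_freq` are hypothetical.

```python
from collections import Counter, defaultdict


def rank_ending_candidates(web_sentences, max_len=6, min_freq=2):
    """Rank sentence-final character strings of Web sentences.

    Assumed ordering (not taken from the paper): higher left branching
    factor first, then higher frequency, then longer string.
    Returns a list of (candidate, branching_factor, frequency) tuples.
    """
    freq = Counter()
    left_contexts = defaultdict(set)

    for sentence in web_sentences:
        text = sentence.rstrip("。")  # drop the Japanese full stop
        for n in range(1, min(max_len, len(text)) + 1):
            suffix = text[-n:]
            freq[suffix] += 1
            # Character immediately before the suffix; "^" marks sentence start.
            left = text[-n - 1] if len(text) > n else "^"
            left_contexts[suffix].add(left)

    candidates = [
        (s, len(left_contexts[s]), freq[s])
        for s in freq
        if freq[s] >= min_freq
    ]
    candidates.sort(key=lambda c: (-c[1], -c[2], -len(c[0])))
    return candidates


# Toy input: Web sentences assumed to be aligned with mobile sentences
# that end in the action noun 発表 ("announcement").
sentences = [
    "首相が法案を発表した。",
    "市長が計画も発表した。",
    "会社は昨日発表した。",
]
ranked = rank_ending_candidates(sentences)
print(ranked[0][0])
```

On this toy input the top candidate is 発表した, because it is preceded by three different characters while its shorter suffixes are each preceded by only one. A high count of distinct preceding characters suggests the string starts at a natural boundary, which is the intuition behind using branching factor here.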
Similar resources
Extracting Paraphrases of Japanese Sentence Ending Part From Web and Mobile News Articles
In this research, we extract paraphrases from Japanese Web news articles that are long and aimed at displaying on personal computer screens and mobile news articles that are short and compact and aimed at mobile terminals’ small screens. We have collected them for more than two years, and aligned them at article level and then at sentence level. As the result, we got more than 88,000 pairs of a...
Terminal Device Oriented Comparable Corpora and its Alignment- Towards Extracting Paraphrasing Patterns
Many terminal devices for mobile environment such as mobile phones have small and low resolution screens compared to the big and high resolution screen of personal computers. In this circumstance, Web pages for ordinary personal computer and mobile phones written in the same language are developed separately even though they describe the same topic or contents. In this research, we collected We...
Paraphrasing Headlines by Machine Translation
In this paper we investigate the automatic collection, generation and evaluation of sentential paraphrases. Valuable sources of paraphrases are news article headlines; they tend to describe the same event in various different ways, and can easily be obtained from the web. We describe a method for generating paraphrases by using a large aligned monolingual corpus of news headlines acquired autom...
Automatic Paraphrase Acquisition from News Articles
Paraphrases play an important role in the variety and complexity of natural language documents. However, they add to the difficulty of natural language processing. Here we describe a procedure for obtaining paraphrases from news articles. A set of paraphrases can be useful for various kinds of applications. Articles derived from different newspapers can contain paraphrases if they report the sam...
Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources
We investigate unsupervised techniques for acquiring monolingual sentence-level paraphrases from a corpus of temporally and topically clustered news articles collected from thousands of web-based news sources. Two techniques are employed: (1) simple string edit distance, and (2) a heuristic strategy that pairs initial (presumably summary) sentences from different news stories in the same cluste...